{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "181b5b6d",
      "metadata": {},
      "source": [
        "# How to parse XML output\n",
        "\n",
        ":::info Prerequisites\n",
        "\n",
        "This guide assumes familiarity with the following concepts:\n",
        "- [Chat models](/docs/concepts/chat_models)\n",
        "- [Output parsers](/docs/concepts/output_parsers)\n",
        "- [Prompt templates](/docs/concepts/prompt_templates)\n",
        "- [Structured output](/docs/how_to/structured_output)\n",
        "- [Chaining runnables together](/docs/how_to/sequence/)\n",
        "\n",
        ":::\n",
        "\n",
        "LLMs from different providers often have different strengths depending on the specific data they are trianed on. This also means that some may be \"better\" and more reliable at generating output in formats other than JSON.\n",
        "\n",
        "This guide shows you how to use the [`XMLOutputParser`](https://api.js.langchain.com/classes/langchain_core.output_parsers.XMLOutputParser.html) to prompt models for XML output, then and parse that output into a usable format.\n",
        "\n",
        ":::{.callout-note}\n",
        "Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed XML.\n",
        ":::\n",
        "\n",
        "In the following examples, we use Anthropic's Claude (https://docs.anthropic.com/claude/docs), which is one such model that is optimized for XML tags.\n",
        "\n",
        "```{=mdx}\n",
        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
        "\n",
        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
        "\n",
        "<Npm2Yarn>\n",
        "  @langchain/anthropic @langchain/core\n",
        "</Npm2Yarn>\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "da312f86-0d2a-4aef-a09d-1e72bd0ea9b1",
      "metadata": {},
      "source": [
        "Let's start with a simple request to the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "id": "b03785af-69fc-40a1-a1be-c04ed6fade70",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Here is the shortened filmography for Tom Hanks, with movies enclosed in \"movie\" tags:\n",
            "\n",
            "<movie>Forrest Gump</movie>\n",
            "<movie>Saving Private Ryan</movie>\n",
            "<movie>Cast Away</movie>\n",
            "<movie>Apollo 13</movie>\n",
            "<movie>Catch Me If You Can</movie>\n",
            "<movie>The Green Mile</movie>\n",
            "<movie>Toy Story</movie>\n",
            "<movie>Toy Story 2</movie>\n",
            "<movie>Toy Story 3</movie>\n",
            "<movie>Toy Story 4</movie>\n",
            "<movie>Philadelphia</movie>\n",
            "<movie>Big</movie>\n",
            "<movie>Sleepless in Seattle</movie>\n",
            "<movie>You've Got Mail</movie>\n",
            "<movie>The Terminal</movie>\n"
          ]
        }
      ],
      "source": [
        "import { ChatAnthropic } from \"@langchain/anthropic\";\n",
        "\n",
        "const model = new ChatAnthropic({\n",
        "  model: \"claude-3-sonnet-20240229\",\n",
        "  maxTokens: 512,\n",
        "  temperature: 0.1,\n",
        "});\n",
        "\n",
        "const query = `Generate the shortened filmograph for Tom Hanks.`;\n",
        "\n",
        "const result = await model.invoke(query + ` Please enclose the movies in \"movie\" tags.`);\n",
        "\n",
        "console.log(result.content);"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4db65781-3d54-4ba6-ae26-5b4ead47a4c8",
      "metadata": {},
      "source": [
        "This actually worked pretty well! But it would be nice to parse that XML into a more easily usable format. We can use the `XMLOutputParser` to both add default format instructions to the prompt and parse outputted XML into a dict:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "id": "6917e057",
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m\"The output should be formatted as a XML file.\\n\"\u001b[39m +\n",
              "  \u001b[32m\"1. Output should conform to the tags below. \\n\"\u001b[39m +\n",
              "  \u001b[32m\"2. If tag\"\u001b[39m... 434 more characters"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import { XMLOutputParser } from \"@langchain/core/output_parsers\";\n",
        "\n",
        "// We will add these instructions to the prompt below\n",
        "const parser = new XMLOutputParser();\n",
        "\n",
        "parser.getFormatInstructions();"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "id": "87ba8d11",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"filmography\": [\n",
            "    {\n",
            "      \"actor\": [\n",
            "        {\n",
            "          \"name\": \"Tom Hanks\"\n",
            "        },\n",
            "        {\n",
            "          \"films\": [\n",
            "            {\n",
            "              \"film\": [\n",
            "                {\n",
            "                  \"title\": \"Forrest Gump\"\n",
            "                },\n",
            "                {\n",
            "                  \"year\": \"1994\"\n",
            "                },\n",
            "                {\n",
            "                  \"role\": \"Forrest Gump\"\n",
            "                }\n",
            "              ]\n",
            "            },\n",
            "            {\n",
            "              \"film\": [\n",
            "                {\n",
            "                  \"title\": \"Saving Private Ryan\"\n",
            "                },\n",
            "                {\n",
            "                  \"year\": \"1998\"\n",
            "                },\n",
            "                {\n",
            "                  \"role\": \"Captain Miller\"\n",
            "                }\n",
            "              ]\n",
            "            },\n",
            "            {\n",
            "              \"film\": [\n",
            "                {\n",
            "                  \"title\": \"Cast Away\"\n",
            "                },\n",
            "                {\n",
            "                  \"year\": \"2000\"\n",
            "                },\n",
            "                {\n",
            "                  \"role\": \"Chuck Noland\"\n",
            "                }\n",
            "              ]\n",
            "            },\n",
            "            {\n",
            "              \"film\": [\n",
            "                {\n",
            "                  \"title\": \"Catch Me If You Can\"\n",
            "                },\n",
            "                {\n",
            "                  \"year\": \"2002\"\n",
            "                },\n",
            "                {\n",
            "                  \"role\": \"Carl Hanratty\"\n",
            "                }\n",
            "              ]\n",
            "            },\n",
            "            {\n",
            "              \"film\": [\n",
            "                {\n",
            "                  \"title\": \"The Terminal\"\n",
            "                },\n",
            "                {\n",
            "                  \"year\": \"2004\"\n",
            "                },\n",
            "                {\n",
            "                  \"role\": \"Viktor Navorski\"\n",
            "                }\n",
            "              ]\n",
            "            }\n",
            "          ]\n",
            "        }\n",
            "      ]\n",
            "    }\n",
            "  ]\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
        "\n",
        "const prompt = ChatPromptTemplate.fromTemplate(`{query}\\n{format_instructions}`);\n",
        "const partialedPrompt = await prompt.partial({\n",
        "  format_instructions: parser.getFormatInstructions(),\n",
        "});\n",
        "\n",
        "const chain = partialedPrompt.pipe(model).pipe(parser);\n",
        "\n",
        "const output = await chain.invoke({\n",
        "  query: \"Generate the shortened filmograph for Tom Hanks.\",\n",
        "});\n",
        "\n",
        "console.log(JSON.stringify(output, null, 2));"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "327f5479-77e0-4549-8393-2cd7a286d491",
      "metadata": {},
      "source": [
        "You'll notice above that our output is no longer just between `movie` tags. We can also add some tags to tailor the output to our needs:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "id": "4af50494",
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "\u001b[32m\"The output should be formatted as a XML file.\\n\"\u001b[39m +\n",
              "  \u001b[32m\"1. Output should conform to the tags below. \\n\"\u001b[39m +\n",
              "  \u001b[32m\"2. If tag\"\u001b[39m... 460 more characters"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "const parserWithTags = new XMLOutputParser({ tags: [\"movies\", \"actor\", \"film\", \"name\", \"genre\"] });\n",
        "\n",
        "// We will add these instructions to the prompt below\n",
        "parserWithTags.getFormatInstructions();"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6563ca36",
      "metadata": {},
      "source": [
        "You can and should experiment with adding your own formatting hints in the other parts of your prompt to either augment or replace the default instructions.\n",
        "\n",
        "Here's the result when we invoke it:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "id": "b722a235",
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  \"movies\": [\n",
            "    {\n",
            "      \"actor\": [\n",
            "        {\n",
            "          \"film\": [\n",
            "            {\n",
            "              \"name\": \"Forrest Gump\"\n",
            "            },\n",
            "            {\n",
            "              \"genre\": \"Drama\"\n",
            "            }\n",
            "          ]\n",
            "        },\n",
            "        {\n",
            "          \"film\": [\n",
            "            {\n",
            "              \"name\": \"Saving Private Ryan\"\n",
            "            },\n",
            "            {\n",
            "              \"genre\": \"War\"\n",
            "            }\n",
            "          ]\n",
            "        },\n",
            "        {\n",
            "          \"film\": [\n",
            "            {\n",
            "              \"name\": \"Cast Away\"\n",
            "            },\n",
            "            {\n",
            "              \"genre\": \"Drama\"\n",
            "            }\n",
            "          ]\n",
            "        },\n",
            "        {\n",
            "          \"film\": [\n",
            "            {\n",
            "              \"name\": \"Catch Me If You Can\"\n",
            "            },\n",
            "            {\n",
            "              \"genre\": \"Biography\"\n",
            "            }\n",
            "          ]\n",
            "        },\n",
            "        {\n",
            "          \"film\": [\n",
            "            {\n",
            "              \"name\": \"The Terminal\"\n",
            "            },\n",
            "            {\n",
            "              \"genre\": \"Comedy-drama\"\n",
            "            }\n",
            "          ]\n",
            "        }\n",
            "      ]\n",
            "    }\n",
            "  ]\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
        "\n",
        "const promptWithTags = ChatPromptTemplate.fromTemplate(`{query}\\n{format_instructions}`);\n",
        "const partialedPromptWithTags = await promptWithTags.partial({\n",
        "  format_instructions: parserWithTags.getFormatInstructions(),\n",
        "});\n",
        "\n",
        "const chainWithTags = partialedPromptWithTags.pipe(model).pipe(parserWithTags);\n",
        "\n",
        "const outputWithTags = await chainWithTags.invoke({\n",
        "  query: \"Generate the shortened filmograph for Tom Hanks.\",\n",
        "});\n",
        "\n",
        "console.log(JSON.stringify(outputWithTags, null, 2));"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6902fe6f",
      "metadata": {},
      "source": [
        "## Next steps\n",
        "\n",
        "You've now learned how to prompt a model to return XML. Next, check out the [broader guide on obtaining structured output](/docs/how_to/structured_output) for other related techniques."
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Deno",
      "language": "typescript",
      "name": "deno"
    },
    "language_info": {
      "file_extension": ".ts",
      "mimetype": "text/x.typescript",
      "name": "typescript",
      "nb_converter": "script",
      "pygments_lexer": "typescript",
      "version": "5.3.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}